Journal article
Using expected sequence features to improve basecalling accuracy of amplicon pyrosequencing data
TS Rask, B Petersen, DS Chen, KP Day, AG Pedersen
BMC Bioinformatics | BIOMED CENTRAL LTD | Published: 2016
Abstract
Background: Amplicon pyrosequencing targets a known genetic region and thus inherently produces reads highly anticipated to have certain features, such as conserved nucleotide sequence, and in the case of protein coding DNA, an open reading frame. Pyrosequencing errors, consisting mainly of nucleotide insertions and deletions, are on the other hand likely to disrupt open reading frames. Such an inverse relationship between errors and expectation based on prior knowledge can be used advantageously to guide the process known as basecalling, i.e. the inference of nucleotide sequence from raw sequencing data. Results: The new basecalling method described here, named Multipass, implements a proba..
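The inverse relationship between indel errors and an intact reading frame can be illustrated with a minimal sketch. This is not the Multipass method, whose description is cut off in the abstract above. It only shows how a single homopolymer overcall can shift the frame and introduce a premature stop codon. The sequences, the function name has_open_reading_frame, and the use of the standard genetic code are assumptions made for illustration.

```python
# Illustrative only: hypothetical sequences, standard genetic code assumed.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_open_reading_frame(seq, frame=0):
    """Return True if seq, read in the given frame, has no in-frame stop codon."""
    seq = seq.upper()
    # Walk over complete codons only; a trailing partial codon is ignored.
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return all(codon not in STOP_CODONS for codon in codons)


# Hypothetical error-free amplicon: ATG GCT GAT AAA GGT TGC CTG
reference = "ATGGCTGATAAAGGTTGCCTG"

# Same read with one extra G in the "GG" homopolymer (an insertion error):
# ATG GGC TGA ... so the shifted frame now contains a stop codon.
read_with_insertion = "ATGGGCTGATAAAGGTTGCCTG"

reference_ok = has_open_reading_frame(reference)
insertion_ok = has_open_reading_frame(read_with_insertion)
```

Here reference_ok is True and insertion_ok is False, so a check of this kind would flag the second read as likely to contain an indel. A probabilistic basecaller could use such a signal as prior knowledge rather than as a hard filter.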
Grants
Awarded by National Institutes of Health
Funding Acknowledgements
This work was supported by the National Institute of Allergy and Infectious Diseases, National Institutes of Health [grant number R01-AI084156]; the Fogarty International Center at the National Institutes of Health [Program on the Ecology and Evolution of Infectious Diseases, grant number R01-TW009670]; and the Lundbeck Foundation [grant number R48-A4847].